Skip to content

Proposed schema changes.#132

Open
bcli4d wants to merge 8 commits intomasterfrom
idc_v24
Open

Proposed schema changes.#132
bcli4d wants to merge 8 commits intomasterfrom
idc_v24

Conversation

@bcli4d
Copy link
Copy Markdown
Member

@bcli4d bcli4d commented Mar 18, 2026

No description provided.

BillClifford added 3 commits March 17, 2026 17:44
… into idc_v24

# Conflicts:
#	bq/generate_tables_and_views/auxiliary_metadata_table/schema.py
#	bq/generate_tables_and_views/original_collections_metadata/schema.py
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we agree to call this "collection_title" for consistency? Also, I would prefer to follow the convention of no-caps and underscore separator for all column names that are outside of DICOM, and CamelCase for those in DICOM. I do understand that capitalization is cosmetic and not recognized in SQL, but I think these conventions help with readability.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collection_title brings up analysis_results_metadata, which also has ID and Title columns. You suggested analysis_collection_id, analysis_collection_title, and analysis_collection_name. I will add the analysis_results_metadata to the PR with those changes.

I'll take out the camels.

With respect to lower-casing where there is CamelCase, e.g. Subjects to subjects, I cannot deprecate the upper case version since BQ will regard the upper case and lower case column names as being the same. Just pointing this out.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just rename it to lower-case, since as you said this will have zero effect on the queries?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I was trying to say.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sources_of_original_data[ID] should probably be sources_of_original_data[collection_name], and
related_analysis_results[ID] should be related_analysis_results[analysis_collection_name]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we never use [collection|analysis_result]_name for references - I think those should be [collection|analysis_result]_id.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revised

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above - I would suggest "cancer_types".

bigquery.SchemaField('Species', 'STRING', mode='REQUIRED', description="Species of collection subjects"),
bigquery.SchemaField(
"Sources",
"Sources_of_original_data",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about the name. "sources_of_original_data", but then the record says "original or analysis results"? Why not keep "sources" as originally?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bug. I'll correct the description. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants